DEbChIP User Guide
The goal of DEbChIP is to assist in the processing of RNA-seq data and integrating RNA-seq and ChIP-seq data.. It contains six functional modules: Quality Control (QC) on count matrix and samples, Principal Component Analysis (PCA) ,Differential Expression Gene Visualization (Volcano Plot, Scatter Plot, MA Plot, Rank Plot, Gene Plot, Heatmap), Functional Enrichment Analysis (GO enrichment analysis, KEGG enrichment analysis, Gene Set Enrichment Analysis), Integrate with ChIP-seq (Get consensus peaks, Peak profile plots, Peak annotaion, Integrate with RNA-seq, Integrate summary, GO enrichment on integrated results), Utils (useful functions when dealing with RNA-seq data, including gene name conversion and count normalization).
DEbChIP 0.1.0
DEbChIP is an R package distributed as part of the Bioconductor project and CRAN. To install the package, start R and enter:
# install via Bioconductor
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
BiocManager::install("DEbChIP")
# install via Github
# install.package("remotes") #In case you have not installed it.
remotes::install_github("showteeth/DEbChIP")
Once DEbChIP is installed, it can be loaded by the following command.
library("DEbChIP")
The goal of DEbChIP is to assist in the processing of RNA-seq data. It contains six functional modules:
To enhance the ease of use of the tool, we have also developed an interactive tool for DEbChIP that allows users to submit files to the web page and set parameters to get the desired results. Unlike the standalone R package, the interactive web page has built-in DESeq2 for differential expression analysis, while the R package can accept user input results from DESeq2 or edgeR, which will be more flexible.
By the way, all plots generated are publication-ready, and most of them are based on ggplot2, so that users can easily modify them according to their needs.
The data used here are RNA-seq data of the external granule layer in the cerebellum of control and conditional SnoN knockout mice, the raw data are stored in GSE120279, they are used in Robust principal component analysis for accurate outlier sample detection in RNA-seq data.
First, we create DESeqDataSet object with above dataset:
suppressWarnings(suppressMessages(library(DESeq2)))
suppressWarnings(suppressMessages(library(DEbChIP)))
count.file <- system.file("extdata", "snon_count.txt", package = "DEbChIP")
meta.file <- system.file("extdata", "snon_meta.txt", package = "DEbChIP")
count.matrix <- read.table(file = count.file, header = T, sep = "\t")
head(count.matrix)
## KO1 KO2 KO3 KO4 KO5 KO6 WT1 WT2 WT3 WT4 WT5
## ENSMUSG00000000001.4 4556 4218 3835 3718 5741 3875 4115 5074 2931 4374 5118
## ENSMUSG00000000003.15 0 0 0 0 0 0 0 0 0 0 0
## ENSMUSG00000000028.14 350 579 435 316 432 317 245 621 419 506 545
## ENSMUSG00000000031.16 268 804 66 207 46 66 336 112 60 69 137
## ENSMUSG00000000037.16 262 157 184 162 301 233 311 176 94 139 229
## ENSMUSG00000000049.11 0 0 0 0 0 0 1 0 2 0 1
## WT6
## ENSMUSG00000000001.4 3625
## ENSMUSG00000000003.15 0
## ENSMUSG00000000028.14 371
## ENSMUSG00000000031.16 193
## ENSMUSG00000000037.16 186
## ENSMUSG00000000049.11 0
meta.info <- read.table(file = meta.file, header = T)
head(meta.info)
## condition
## KO1 KO
## KO2 KO
## KO3 KO
## KO4 KO
## KO5 KO
## KO6 KO
dds <- DESeq2::DESeqDataSetFromMatrix(countData = count.matrix, colData = meta.info, design = ~condition)
dds
## class: DESeqDataSet
## dim: 53379 12
## metadata(1): version
## assays(1): counts
## rownames(53379): ENSMUSG00000000001.4 ENSMUSG00000000003.15 ...
## ENSMUSG00000115849.1 ENSMUSG00000115850.1
## rowData names(0):
## colnames(12): KO1 KO2 ... WT5 WT6
## colData names(1): condition
Quality control are of vital importance to get realiable results. Here, quality control includes two main aspects: sequencing saturation (QC on count matrix) and sample similarity (QC on samples).
This plot shows proportion of genes detected in different samples at different CPM thresholds. Theoretically, all samples should be similarly distributed, so as to avoid false positive results obtained from detection problems.
CountQC(deobj = dds, group.key = "condition", type = "cpm")
## Differential expression analysis with DESeq2!
## [1] "Warning: 25096 features with 0 counts in all samples are to be removed for this analysis."
## [1] "Count distributions are to be computed for:"
## [1] "KO1" "KO2" "KO3" "KO4" "KO5" "KO6" "WT1" "WT2" "WT3" "WT4" "WT5" "WT6"
This plot shows the proportion of genes detected at different sequencing depths. If the proportion is too low, it indicates that many genes are not detected and therefore some important information may be missing, and increasing the sequencing volume can effectively solve this problem.
CountQC(deobj = dds, group.key = "condition", type = "saturation")
## Differential expression analysis with DESeq2!
QC on samples including four aspects:
SampleRelation(deobj = dds,transform.method = "rlog",anno.key = "condition")
## Differential expression analysis with DESeq2!
qc.pca.res=PCA(deobj = dds,transform.method = "rlog")
## Differential expression analysis with DESeq2!
## Use all genes for PCA!
PCAtools::biplot(qc.pca.res,x="PC1",y="PC2",colby="condition",legendPosition="bottom")
outlier.res=OutlierDetection(deobj = dds,var.genes = NULL,transform.method = "rlog")
## Differential expression analysis with DESeq2!
## Use all genes for PCA!
## Detecting 2 outlier(s): KO3,WT1
outlier.res$outlier
## [1] "KO3" "WT1"
outlier.res$plot
For batch correction and outlier detection, you can use the following codes:
# batch effect correction
# this is a demo scripts
batch.res=QCPCA(deobj = dds,var.genes = NULL,remove.sample=NULL,transform.method = "rlog",batch = "cell",
colby = "dex", outlier.detection = F)
batch.res$plot
# outlier detection
outlier.res=QCPCA(deobj = dds,var.genes = NULL,remove.sample=NULL,transform.method = "rlog",
outlier.detection = T,rpca.method = "PcaGrid")
outlier.res$plot
# the final dds
dds=outlier.res$deobj # or outlier.res$plot
# conduct PCA
pca.res=PCA(deobj = dds,remove.sample = outlier.res$outlier,transform.method = "rlog")
## Differential expression analysis with DESeq2!
## Use all genes for PCA!
# get basic plots
basic.plots=PCABasic(pca.res,colby="condition",legend.pos = "right")
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
basic.plots[["screen"]]
basic.plots[["biplot"]]
basic.plots[["pairs"]]
PCA3D(pca = pca.res,color.key = "condition",main = "3D PCA")
First you should get differential expression genes. For DESeq2 results:
# set control level
dds$condition <- relevel(dds$condition, ref = "WT")
# conduct differential expressed genes analysis
dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
# extract results
dds.results <- results(dds,contrast=c("condition",'KO','WT'))
dds.results.ordered <- dds.results[order(dds.results$log2FoldChange,decreasing = TRUE),]
head(dds.results.ordered)
## log2 fold change (MLE): condition KO vs WT
## Wald test p-value: condition KO vs WT
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000025934.15 2.01093 4.46030 3.01742 1.47818 NA
## ENSMUSG00000105597.1 1.64376 4.16931 2.98872 1.39502 0.163011113
## ENSMUSG00000109866.1 2.32064 4.14573 1.22504 3.38415 0.000713987
## ENSMUSG00000085819.7 2.23441 4.07496 1.65939 2.45569 0.014061354
## ENSMUSG00000102474.1 1.52867 4.06781 1.80781 2.25013 0.024440396
## ENSMUSG00000042677.7 1.49781 4.05394 1.71350 2.36588 0.017987495
## padj
## <numeric>
## ENSMUSG00000025934.15 NA
## ENSMUSG00000105597.1 0.99973
## ENSMUSG00000109866.1 0.99973
## ENSMUSG00000085819.7 0.99973
## ENSMUSG00000102474.1 0.99973
## ENSMUSG00000042677.7 0.99973
For edgeR results:
library(edgeR)
## Loading required package: limma
##
## Attaching package: 'limma'
## The following object is masked from 'package:DESeq2':
##
## plotMA
## The following object is masked from 'package:BiocGenerics':
##
## plotMA
snon.edgeR=DGEList(counts=count.matrix, group=meta.info$condition)
keep <- filterByExpr(snon.edgeR,min.count=10)
snon.edgeR <- snon.edgeR[keep, , keep.lib.sizes=FALSE]
snon.edgeR <- calcNormFactors(snon.edgeR)
snon.edgeR$samples$group <- relevel(snon.edgeR$samples$group, ref="WT")
design <- model.matrix(~snon.edgeR$samples$group)
snon.edgeR <- estimateDisp(snon.edgeR, design)
fit <- glmQLFit(snon.edgeR, design)
qlf <- glmQLFTest(fit, coef=2)
all.res <- topTags(qlf,n=nrow(snon.edgeR$counts))$table
head(all.res)
## logFC logCPM F PValue FDR
## ENSMUSG00000039620.17 2.1925630 1.7076072 70.78781 2.686429e-07 0.003577786
## ENSMUSG00000046152.16 -0.7238661 4.9567684 36.89615 1.550592e-05 0.103253915
## ENSMUSG00000022376.8 -1.7356149 0.3243428 21.61086 2.617267e-04 0.998523400
## ENSMUSG00000027660.16 -0.9581197 5.3679317 19.98078 3.791045e-04 0.998523400
## ENSMUSG00000031478.16 0.8734604 2.4708238 18.99645 4.783255e-04 0.998523400
## ENSMUSG00000079469.10 -1.2129059 1.4296852 18.36878 5.568020e-04 0.998523400
# VolcanoPlot for DESeq2
VolcanoPlot(dds.results.ordered,signif="pvalue",l2fc.threashold=0.3,label.num=2,
point.alpha = 0.8, label.color=c("purple","green"),tick.trans = NULL)
## Differential expression analysis with DESeq2!
# VolcanoPlot for edgeR
VolcanoPlot(all.res,signif="PValue",l2fc.threashold=0.3,label.num=2,point.alpha = 0.8,
label.color=c("purple","green"),tick.trans = NULL)
## Differential expression analysis with edgeR!
# ScatterPlot for DESeq2
ScatterPlot(deobj = dds,deres = dds.results.ordered,group.key = "condition",
ref.group = "WT",signif="pvalue",l2fc.threashold=0.3,label.num = 2,
point.alpha = 0.8,label.color=c("purple","green"))
## Differential expression analysis with DESeq2!
## Loading required package: ggplot2
# ScatterPlot for edgeR
ScatterPlot(deobj = snon.edgeR,deres = all.res,group.key = "condition",
ref.group = "WT",signif="PValue",l2fc.threashold=0.3,label.num = 2,
point.alpha = 0.8,label.color=c("purple","green"))
## Differential expression analysis with edgeR!
# MAPlot for DESeq2
MAPlot(dds.results.ordered,signif="pvalue",l2fc.threashold=0.3,label.num=2,
point.alpha = 0.8, label.color=c("purple","green"))
## Differential expression analysis with DESeq2!
# MAPlot for edgeR
MAPlot(all.res,signif="PValue",l2fc.threashold=0.3,label.num=2,point.alpha = 0.8,
label.color=c("purple","green"))
## Differential expression analysis with edgeR!
# RankPlot for DESeq2
RankPlot(dds.results.ordered,signif="pvalue",l2fc.threashold=0.3,label.num=2,
point.alpha = 0.8, label.color=c("purple","green"))
## Differential expression analysis with DESeq2!
# RankPlot for edgeR
RankPlot(all.res,signif="PValue",l2fc.threashold=0.3,label.num=2,point.alpha = 0.8,
label.color=c("purple","green"))
## Differential expression analysis with edgeR!
# GenePlot for DESeq2
GenePlot(deobj = dds,deres = dds.results.ordered,group.key = "condition",
ref.group = "WT",fill.color=c("red","blue"), fill.alpha = 0.8,
gene.num =2,signif="pvalue",l2fc.threashold=0.3)
## Differential expression analysis with DESeq2!
# GenePlot for edgeR
GenePlot(deobj = snon.edgeR,deres = all.res,group.key = "condition",
ref.group = "WT",fill.color=c("red","blue"),fill.alpha = 0.8,
gene.num =2,signif="PValue",l2fc.threashold=0.3)
## Differential expression analysis with edgeR!
# DEHeatmap for DESeq2
DEHeatmap(deobj = dds,deres = dds.results.ordered,group.key = "condition",
ref.group = "WT", signif="pvalue",l2fc.threashold=0.3)
## Differential expression analysis with DESeq2!
# DEHeatmap for edgeR
DEHeatmap(deobj = snon.edgeR,deres = all.res,group.key = "condition",
ref.group = "WT", signif="PValue",l2fc.threashold=0.3)
## Differential expression analysis with edgeR!
# save results to working directory
# ConductFE(deres = dds.results.ordered, signif = "pvalue", l2fc.threashold = 0.3,enrich.type = "GO")
all.go.results=ConductFE(deres = dds.results.ordered, signif = "pvalue", l2fc.threashold = 0.3,enrich.type = "GO",str.width = 50, save = F)
## Differential expression analysis with DESeq2!
## Convert ENSEMBL to ENTREZID!
## 'select()' returned 1:many mapping between keys and columns
## conduct ALL GO enrichment analysis on: UP
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Convert ENSEMBL to ENTREZID!
## 'select()' returned 1:1 mapping between keys and columns
## conduct ALL GO enrichment analysis on: DOWN
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
up.go.res=all.go.results[["UP"]][["GO"]]
up.go.res.table=up.go.res[["table"]]
head(up.go.res.table)
## ONTOLOGY ID Description
## GO:0097060 CC GO:0097060 synaptic membrane
## GO:0099240 CC GO:0099240 intrinsic component of synaptic membrane
## GO:0014069 CC GO:0014069 postsynaptic density
## GO:0032279 CC GO:0032279 asymmetric synapse
## GO:0098839 CC GO:0098839 postsynaptic density membrane
## GO:0098984 CC GO:0098984 neuron to neuron synapse
## GeneRatio BgRatio pvalue p.adjust qvalue
## GO:0097060 15/231 470/23271 7.783967e-05 0.01822125 0.01447238
## GO:0099240 10/231 244/23271 1.732798e-04 0.01822125 0.01447238
## GO:0014069 13/231 400/23271 1.957825e-04 0.01822125 0.01447238
## GO:0032279 13/231 405/23271 2.208636e-04 0.01822125 0.01447238
## GO:0098839 6/231 97/23271 4.175021e-04 0.01868255 0.01483877
## GO:0098984 13/231 433/23271 4.183638e-04 0.01868255 0.01483877
## geneID
## GO:0097060 Slc6a2/Met/Nt5e/Lpar2/Grik1/Tenm2/Tiam1/Abhd6/Rgs7bp/Epha7/Snap25/Atp2b2/Adcy1/Adgrl2/Cacng2
## GO:0099240 Lpar2/Grik1/Abhd6/Rgs7bp/Epha7/Snap25/Atp2b2/Adcy1/Adgrl2/Cacng2
## GO:0014069 Nefh/Met/Grik1/Rapgef4/Tiam1/Rgs7bp/Epha7/Ctnnd2/Map1b/Atp2b2/Pak2/Adcy1/Cacng2
## GO:0032279 Nefh/Met/Grik1/Rapgef4/Tiam1/Rgs7bp/Epha7/Ctnnd2/Map1b/Atp2b2/Pak2/Adcy1/Cacng2
## GO:0098839 Tiam1/Rgs7bp/Epha7/Atp2b2/Adcy1/Cacng2
## GO:0098984 Nefh/Met/Grik1/Rapgef4/Tiam1/Rgs7bp/Epha7/Ctnnd2/Map1b/Atp2b2/Pak2/Adcy1/Cacng2
## Count
## GO:0097060 15
## GO:0099240 10
## GO:0014069 13
## GO:0032279 13
## GO:0098839 6
## GO:0098984 13
up.go.res[["plot"]]
down.go.res=all.go.results[["DOWN"]][["GO"]]
down.go.res.table = down.go.res[["table"]]
head(down.go.res.table)
## ONTOLOGY ID Description GeneRatio BgRatio
## GO:0005604 CC GO:0005604 basement membrane 7/229 114/23271
## pvalue p.adjust qvalue
## GO:0005604 0.0001374573 0.04178703 0.04007966
## geneID Count
## GO:0005604 Ntn4/Ntn1/Col4a2/Egflam/Papln/Fras1/Col4a4 7
down.go.res[["plot"]]
# save results to working directory
# ConductFE(deres = dds.results.ordered, signif = "pvalue", l2fc.threashold = 0.3,enrich.type = "KEGG")
all.kegg.results=ConductFE(deres = dds.results.ordered, signif = "pvalue", l2fc.threashold = 0.3,enrich.type = "KEGG", str.width = 50, save = F)
## Differential expression analysis with DESeq2!
## Convert ENSEMBL to ENTREZID!
## 'select()' returned 1:many mapping between keys and columns
## conduct KEGG enrichment analysis.
## Reading KEGG annotation online:
##
## Reading KEGG annotation online:
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Convert ENSEMBL to ENTREZID!
## 'select()' returned 1:1 mapping between keys and columns
## conduct KEGG enrichment analysis.
up.kegg.res=all.kegg.results[["UP"]][["KEGG"]]
up.kegg.res.table=up.kegg.res[["table"]]
head(up.kegg.res.table)
## ID Description GeneRatio BgRatio pvalue
## mmu04911 mmu04911 Insulin secretion 7/100 86/8995 4.452578e-05
## mmu04024 mmu04024 cAMP signaling pathway 10/100 220/8995 1.580727e-04
## p.adjust qvalue
## mmu04911 0.009751146 0.008623941
## mmu04024 0.017308964 0.015308096
## geneID Count
## mmu04911 Kcnmb2/Kcnn3/Creb5/Rapgef4/Creb3l2/Snap25/Adcy1 7
## mmu04024 Cftr/Atp2a3/Creb5/Rapgef4/Tiam1/Creb3l2/Sox9/Atp2b2/Crebbp/Adcy1 10
up.kegg.res[["plot"]]
down.kegg.res=all.kegg.results[["DOWN"]][["KEGG"]]
down.kegg.res.table=down.kegg.res[["table"]]
head(down.kegg.res.table)
## [1] ID Description GeneRatio BgRatio pvalue p.adjust
## [7] qvalue geneID Count
## <0 rows> (or 0-length row.names)
down.kegg.res[["plot"]]
For human, we can download gmt file from MSigDB. For other species, we can obtain gene sets via msigdbr.
library(msigdbr)
# list all possible species
msigdbr_species()
## # A tibble: 20 x 2
## species_name species_common_name
## <chr> <chr>
## 1 Anolis carolinensis Carolina anole, green anole
## 2 Bos taurus bovine, cattle, cow, dairy cow, domestic cattle,…
## 3 Caenorhabditis elegans roundworm
## 4 Canis lupus familiaris dog, dogs
## 5 Danio rerio leopard danio, zebra danio, zebra fish, zebrafish
## 6 Drosophila melanogaster fruit fly
## 7 Equus caballus domestic horse, equine, horse
## 8 Felis catus cat, cats, domestic cat
## 9 Gallus gallus bantam, chicken, chickens, Gallus domesticus
## 10 Homo sapiens human
## 11 Macaca mulatta rhesus macaque, rhesus macaques, Rhesus monkey, …
## 12 Monodelphis domestica gray short-tailed opossum
## 13 Mus musculus house mouse, mouse
## 14 Ornithorhynchus anatinus duck-billed platypus, duckbill platypus, platypus
## 15 Pan troglodytes chimpanzee
## 16 Rattus norvegicus brown rat, Norway rat, rat, rats
## 17 Saccharomyces cerevisiae baker's yeast, brewer's yeast, S. cerevisiae
## 18 Schizosaccharomyces pombe … <NA>
## 19 Sus scrofa pig, pigs, swine, wild boar
## 20 Xenopus tropicalis tropical clawed frog, western clawed frog
# create gene sets
m_t2g <- msigdbr(species = "Mus musculus", category = "C5", subcategory = "BP") %>%
dplyr::select(gs_name, entrez_gene)
head(m_t2g)
## # A tibble: 6 x 2
## gs_name entrez_gene
## <chr> <int>
## 1 GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS 67618
## 2 GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS 107747
## 3 GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS 216188
## 4 GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS 108156
## 5 GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS 270685
## 6 GOBP_10_FORMYLTETRAHYDROFOLATE_METABOLIC_PROCESS 665563
gsea.results=ConductGSEA(deres = dds.results.ordered,gmt.file = NULL,gene.sets = m_t2g,save = F)
## Differential expression analysis with DESeq2!
## Convert ENSEMBL to ENTREZID!
## 'select()' returned 1:many mapping between keys and columns
## conduct GSEA anaysis.
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
gsea.results.table=gsea.results[["table"]]
head(gsea.results.table)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
gsea.results[["plot"]]
To enhance the usability of the tool, we create a all-in-one analysis command, all results will be saved in specific directory:
library(msigdbr)
m_t2g <- msigdbr(species = "Mus musculus", category = "C5") %>%
dplyr::select(gs_name, entrez_gene)
count.file <- system.file("extdata", "snon_count.txt", package = "DEbChIP")
meta.file <- system.file("extdata", "snon_meta.txt", package = "DEbChIP")
ConductDESeq2(count.matrix.file = count.file, meta.file = meta.file, out.folder="/home/songyabing/R/learn/tmp/dekit",
signif = "pvalue", l2fc.threashold = 0.3,
group.key = "condition", ref.group = "WT",gmt.file = NULL, gene.sets = m_t2g)
The structrue of results folder:
tree /home/songyabing/R/learn/tmp/dekit
## /home/songyabing/R/learn/tmp/dekit
## ├── CountQC_CPM.pdf
## ├── CountQC_saturation.pdf
## ├── DEG
## │ ├── Condition_KO_WT_all.csv
## │ ├── Condition_KO_WT_pvalue0.05_FC0.3.csv
## │ ├── DEG_GenePlot.pdf
## │ ├── DEG_Heatmap.pdf
## │ ├── DEG_MAPlot.pdf
## │ ├── DEG_RankPlot.pdf
## │ ├── DEG_ScatterPlot.pdf
## │ ├── DEG_VolcanoPlot.pdf
## │ └── normalized_counts.csv
## ├── DEkit_all_in_one.RData
## ├── FE
## │ ├── DOWN_ALL_GO.csv
## │ ├── DOWN_Biological_Process.png
## │ ├── DOWN_Cellular_Component.png
## │ ├── DOWN_Functional_Enrichment.pdf
## │ ├── DOWN_KEGG.csv
## │ ├── DOWN_KEGG_Enrichment.png
## │ ├── DOWN_Molecular_Function.png
## │ ├── UP_ALL_GO.csv
## │ ├── UP_Biological_Process.png
## │ ├── UP_Cellular_Component.png
## │ ├── UP_Functional_Enrichment.pdf
## │ ├── UP_KEGG.csv
## │ ├── UP_KEGG_Enrichment.png
## │ └── UP_Molecular_Function.png
## ├── GSEA
## │ ├── GSEA_enrich_result.csv
## │ ├── GSEA_enrich_result.pdf
## │ └── GSEA_enrich_result.png
## ├── PCA
## │ ├── PC1
## │ │ ├── Negative
## │ │ │ ├── PC1_Negative_ALL_GO.csv
## │ │ │ └── PC1_Negative_KEGG.csv
## │ │ └── Positive
## │ │ ├── PC1_Positive_ALL_GO.csv
## │ │ └── PC1_Positive_KEGG.csv
## │ ├── PC1_Negative_Biological_Process.png
## │ ├── PC1_Negative_Cellular_Component.png
## │ ├── PC1_Negative_Functional_Enrichment.pdf
## │ ├── PC1_Negative_KEGG_Enrichment.png
## │ ├── PC1_Negative_Molecular_Function.png
## │ ├── PC1_Positive_Biological_Process.png
## │ ├── PC1_Positive_Cellular_Component.png
## │ ├── PC1_Positive_Functional_Enrichment.pdf
## │ ├── PC1_Positive_KEGG_Enrichment.png
## │ ├── PC1_Positive_Molecular_Function.png
## │ ├── PC2
## │ │ ├── Negative
## │ │ │ ├── PC2_Negative_ALL_GO.csv
## │ │ │ └── PC2_Negative_KEGG.csv
## │ │ └── Positive
## │ │ ├── PC2_Positive_ALL_GO.csv
## │ │ └── PC2_Positive_KEGG.csv
## │ ├── PC2_Negative_Biological_Process.png
## │ ├── PC2_Negative_Cellular_Component.png
## │ ├── PC2_Negative_Functional_Enrichment.pdf
## │ ├── PC2_Negative_KEGG_Enrichment.png
## │ ├── PC2_Negative_Molecular_Function.png
## │ ├── PC2_Positive_Biological_Process.png
## │ ├── PC2_Positive_Cellular_Component.png
## │ ├── PC2_Positive_Functional_Enrichment.pdf
## │ ├── PC2_Positive_KEGG_Enrichment.png
## │ ├── PC2_Positive_Molecular_Function.png
## │ ├── PC3
## │ │ ├── Negative
## │ │ │ ├── PC3_Negative_ALL_GO.csv
## │ │ │ └── PC3_Negative_KEGG.csv
## │ │ └── Positive
## │ │ ├── PC3_Positive_ALL_GO.csv
## │ │ └── PC3_Positive_KEGG.csv
## │ ├── PC3_Negative_Biological_Process.png
## │ ├── PC3_Negative_Cellular_Component.png
## │ ├── PC3_Negative_Functional_Enrichment.pdf
## │ ├── PC3_Negative_KEGG_Enrichment.png
## │ ├── PC3_Negative_Molecular_Function.png
## │ ├── PC3_Positive_Biological_Process.png
## │ ├── PC3_Positive_Cellular_Component.png
## │ ├── PC3_Positive_Functional_Enrichment.pdf
## │ ├── PC3_Positive_KEGG_Enrichment.png
## │ ├── PC3_Positive_Molecular_Function.png
## │ ├── PC4
## │ │ ├── Negative
## │ │ │ ├── PC4_Negative_ALL_GO.csv
## │ │ │ └── PC4_Negative_KEGG.csv
## │ │ └── Positive
## │ │ ├── PC4_Positive_ALL_GO.csv
## │ │ └── PC4_Positive_KEGG.csv
## │ ├── PC4_Negative_Biological_Process.png
## │ ├── PC4_Negative_Cellular_Component.png
## │ ├── PC4_Negative_Functional_Enrichment.pdf
## │ ├── PC4_Negative_KEGG_Enrichment.png
## │ ├── PC4_Negative_Molecular_Function.png
## │ ├── PC4_Positive_Biological_Process.png
## │ ├── PC4_Positive_Cellular_Component.png
## │ ├── PC4_Positive_Functional_Enrichment.pdf
## │ ├── PC4_Positive_KEGG_Enrichment.png
## │ ├── PC4_Positive_Molecular_Function.png
## │ ├── PC5
## │ │ ├── Negative
## │ │ │ ├── PC5_Negative_ALL_GO.csv
## │ │ │ └── PC5_Negative_KEGG.csv
## │ │ └── Positive
## │ │ ├── PC5_Positive_ALL_GO.csv
## │ │ └── PC5_Positive_KEGG.csv
## │ ├── PC5_Negative_Biological_Process.png
## │ ├── PC5_Negative_Cellular_Component.png
## │ ├── PC5_Negative_Functional_Enrichment.pdf
## │ ├── PC5_Negative_KEGG_Enrichment.png
## │ ├── PC5_Negative_Molecular_Function.png
## │ ├── PC5_Positive_Biological_Process.png
## │ ├── PC5_Positive_Cellular_Component.png
## │ ├── PC5_Positive_Functional_Enrichment.pdf
## │ ├── PC5_Positive_KEGG_Enrichment.png
## │ ├── PC5_Positive_Molecular_Function.png
## │ ├── PCA_3DPCA.pdf
## │ ├── PCA_biplot.pdf
## │ ├── PCA_loading_bar.pdf
## │ ├── PCA_loading_heat.pdf
## │ ├── PCA_overview.pdf
## │ ├── PCA_pairs_plot.pdf
## │ └── PCA_screen_plot.pdf
## └── SampleQC_dist_pcc.pdf
##
## 19 directories, 107 files
The data used here contains RNA-seq and ChIP-seq datasets from RUNX represses Pmp22 to drive neurofibromagenesis:
# prepare count matrix and metadata
debchip.count.file <- system.file("extdata", "debchip_count.txt", package = "DEbChIP")
debchip.meta.file <- system.file("extdata", "debchip_meta.txt", package = "DEbChIP")
debchip.count.matrix <- read.table(file = debchip.count.file, header = T, sep = "\t")
debchip.meta.info <- read.table(file = debchip.meta.file, header = T)
# create DESeqDataSet object
debchip.dds <- DESeq2::DESeqDataSetFromMatrix(countData = debchip.count.matrix,
colData = debchip.meta.info,
design = ~condition)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
# set control level
debchip.dds$condition <- relevel(debchip.dds$condition, ref = "NF")
# conduct differential expressed genes analysis
debchip.dds <- DESeq(debchip.dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
# extract results
debchip.dds.results <- results(debchip.dds,contrast=c("condition",'RX','NF'))
debchip.dds.results.ordered <- debchip.dds.results[order(debchip.dds.results$log2FoldChange,decreasing = TRUE),]
head(debchip.dds.results.ordered)
## log2 fold change (MLE): condition RX vs NF
## Wald test p-value: condition RX vs NF
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## Sycp1 12.07541 7.15056 3.91075 1.82844 6.74840e-02 1.81006e-01
## Gm16532 10.07020 6.89407 1.74281 3.95573 7.63014e-05 7.17146e-04
## Hs3st4 18.88787 6.82852 1.42952 4.77679 1.78117e-06 2.83848e-05
## Nell1 34.13399 6.80718 1.34257 5.07027 3.97260e-07 7.79916e-06
## Sptssb 9.39376 6.78558 1.52948 4.43651 9.14275e-06 1.16684e-04
## Myo3a 7.19992 6.41077 1.74865 3.66612 2.46259e-04 1.94151e-03
In this step, we will get consensus peaks with MSPC when multiple peak files are available, but when there is only one peak file, we will use it directly (make sure this peak file contains five columns: “chr”, “start”, “stop”, “name”, “score”).
# get consensus peak
peak.file = system.file("extdata", "debchip_peaks.bed", package = "DEbChIP")
peak.df = GetConsensusPeak(peak.file = peak.file)
head(peak.df)
## chr start stop name score
## 1 chr13 51519018 51519164 chr13-5 27.6
## 2 chr13 21326999 21327145 chr13-6 25.2
## 3 chr5 151112840 151112986 chr5-2 22.3
## 4 chr11 104361891 104362037 chr11-98 21.8
## 5 chr11 70198551 70198697 chr11-88 21.3
## 6 chr6 29326952 29327098 chr6-4 20.8
Check the profle of consensus peaks:
# peak profile plot
peak.profile = PeakProfile(peak.df ,species="Mouse", by = "gene", region.type = "body", nbin = 800)
## >> preparing promoter regions... 2022-06-30 22时08分15秒
## >> preparing tag matrix... 2022-06-30 22时08分16秒
## >> preparing start_site regions by ... 2022-06-30 22时08分16秒
## >> preparing tag matrix... 2022-06-30 22时08分16秒
## >> generating figure... 2022-06-30 22时08分27秒
## >> done... 2022-06-30 22时08分27秒
## >> binning method is used...2022-06-30 22时08分27秒
## >> preparing start_site regions by gene... 2022-06-30 22时08分27秒
## >> preparing tag matrix by binning... 2022-06-30 22时08分27秒
## >> Running bootstrapping for tag matrix... 2022-06-30 22时08分33秒
## >> binning method is used...2022-06-30 22时08分34秒
## >> preparing body regions by gene... 2022-06-30 22时08分34秒
## >> preparing tag matrix by binning... 2022-06-30 22时08分34秒
## >> preparing matrix with extension from (TSS-20%)~(TTS+20%)... 2022-06-30 22时08分34秒
## >> 1 peaks(0.1536098%), having lengths smaller than 800bp, are filtered... 2022-06-30 22时08分37秒
## >> Running bootstrapping for tag matrix... 2022-06-30 22时09分15秒
peak.profile$profile.plot
Peak annotation with ChIPseeker:
# peak annotation
peak.anno = AnnoPeak(peak.df = peak.df,species = "Mouse",seq.style = "UCSC",up.dist = 20000,down.dist = 20000)
## >> preparing features information... 2022-06-30 22时09分17秒
## >> identifying nearest features... 2022-06-30 22时09分17秒
## >> calculating distance from peak to TSS... 2022-06-30 22时09分17秒
## >> assigning genomic annotation... 2022-06-30 22时09分17秒
## >> adding gene annotation... 2022-06-30 22时09分29秒
## 'select()' returned 1:many mapping between keys and columns
## >> assigning chromosome lengths 2022-06-30 22时09分29秒
## >> done... 2022-06-30 22时09分30秒
## Warning: Removed 6 rows containing non-finite values (stat_count).
peak.anno.df = peak.anno$df
head(peak.anno.df)
## seqnames start end width strand name score
## 1 chr13 51519019 51519164 146 * chr13-5 27.6
## 2 chr13 21327000 21327145 146 * chr13-6 25.2
## 3 chr5 151112841 151112986 146 * chr5-2 22.3
## 4 chr11 104361892 104362037 146 * chr11-98 21.8
## 5 chr11 70198552 70198697 146 * chr11-88 21.3
## 6 chr6 29326953 29327098 146 * chr6-4 20.8
## annotation geneChr geneStart
## 1 Intron (ENSMUST00000021898.5/20418, intron 1 of 11) 13 51431041
## 2 Promoter (9-10kb) 13 21317258
## 3 Promoter (4-5kb) 5 151095421
## 4 Intron (ENSMUST00000106977.7/76719, intron 4 of 13) 11 104334921
## 5 Promoter (14-15kb) 11 70212752
## 6 Promoter (7-8kb) 6 29319199
## geneEnd geneLength geneStrand geneId transcriptId distanceToTSS
## 1 51567084 136044 2 20418 ENSMUST00000021898.5 47920
## 2 21319624 2367 1 75512 ENSMUST00000136668.1 9742
## 3 151108735 13315 2 243362 ENSMUST00000202866.1 -4106
## 4 104341299 6379 2 76719 ENSMUST00000069188.6 -20593
## 5 70216413 3662 1 216867 ENSMUST00000126388.7 -14055
## 6 29335854 16656 1 330277 ENSMUST00000166462.1 7754
## ENSEMBL SYMBOL
## 1 ENSMUSG00000021448 Shc3
## 2 ENSMUSG00000004341 Gpx6
## 3 ENSMUSG00000016128 Stard13
## 4 ENSMUSG00000018412 Kansl1
## 5 ENSMUSG00000040938 Slc16a11
## 6 ENSMUSG00000039742 Fam71f1
## GENENAME
## 1 src homology 2 domain-containing transforming protein C3
## 2 glutathione peroxidase 6
## 3 StAR-related lipid transfer (START) domain containing 13
## 4 KAT8 regulatory NSL complex subunit 1
## 5 solute carrier family 16 (monocarboxylic acid transporters), member 11
## 6 family with sequence similarity 71, member F1
## anno
## 1 Intron
## 2 Promoter
## 3 Promoter
## 4 Intron
## 5 Promoter
## 6 Promoter
peak.anno$plots
In this step, we will integrate ChIP-seq and RNA-seq to get plausible direct targets of promoter (in this example, Runx). ### Integrate
debchip.res = DEbChIP(de.res = debchip.dds.results,chip.res = peak.anno.df,chip.anno.key = "Promoter", merge.key="SYMBOL")
## Differential expression analysis with DESeq2!
head(debchip.res)
## SYMBOL geneId annotation anno ENSEMBL
## 1 0610012G03Rik <NA> <NA> <NA> <NA>
## 2 1110002J07Rik 68488 Promoter (12-13kb) Promoter <NA>
## 3 1110008P14Rik <NA> <NA> <NA> <NA>
## 4 1110032F04Rik <NA> <NA> <NA> <NA>
## 5 1500009C09Rik <NA> <NA> <NA> <NA>
## 6 1500011B03Rik <NA> <NA> <NA> <NA>
## GENENAME log2FoldChange abundance signif regulation
## 1 <NA> 1.837668 59.179193 3.073773 Up_regulated
## 2 RIKEN cDNA 1110002J07 gene NA NA NA <NA>
## 3 <NA> 1.013546 125.765543 2.277566 Up_regulated
## 4 <NA> 2.935223 89.091244 4.714179 Up_regulated
## 5 <NA> 3.183848 9.432253 1.492741 Up_regulated
## 6 <NA> 1.058984 227.077738 2.656922 Up_regulated
## Type
## 1 UP
## 2 ChIP
## 3 UP
## 4 UP
## 5 UP
## 6 UP
# DE and ChIP venn plot
debchip.plot = PlotDEbChIP(debchip.res,show_percentage=FALSE)
debchip.plot
# functional enrichment on direct targets
debchip.fe.results = DEbChIPFE(de.chip = debchip.res,gene.type = "ENTREZID",species="Mouse",save = F)
## conduct ALL GO enrichment analysis on: UPbChIP
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## conduct ALL GO enrichment analysis on: DOWNbChIP
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## wrong orderBy parameter; set to default `orderBy = "x"`
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
up.debchip.fe.results=debchip.fe.results[["UPbChIP"]][["GO"]]
head(up.debchip.fe.results[["table"]])
## ONTOLOGY ID Description
## GO:0032288 BP GO:0032288 myelin assembly
## GO:0019233 BP GO:0019233 sensory perception of pain
## GO:0043951 BP GO:0043951 negative regulation of cAMP-mediated signaling
## GO:0022011 BP GO:0022011 myelination in peripheral nervous system
## GO:0032292 BP GO:0032292 peripheral nervous system axon ensheathment
## GO:0014044 BP GO:0014044 Schwann cell development
## GeneRatio BgRatio pvalue p.adjust qvalue
## GO:0032288 2/21 23/23328 0.0001930551 0.02159856 0.01416299
## GO:0019233 3/21 146/23328 0.0002940462 0.02159856 0.01416299
## GO:0043951 2/21 30/23328 0.0003306741 0.02159856 0.01416299
## GO:0022011 2/21 31/23328 0.0003532875 0.02159856 0.01416299
## GO:0032292 2/21 31/23328 0.0003532875 0.02159856 0.01416299
## GO:0014044 2/21 34/23328 0.0004255310 0.02159856 0.01416299
## geneID Count
## GO:0032288 Pmp22/Prx 2
## GO:0019233 Kcnip3/Npy2r/Prx 3
## GO:0043951 Npy2r/Rnf157 2
## GO:0022011 Pmp22/Prx 2
## GO:0032292 Pmp22/Prx 2
## GO:0014044 Pmp22/Prx 2
up.debchip.fe.results[["plot"]]
down.debchip.fe.results=debchip.fe.results[["DOWNbChIP"]][["GO"]]
head(down.debchip.fe.results[["table"]])
## ONTOLOGY ID Description
## GO:0010810 BP GO:0010810 regulation of cell-substrate adhesion
## GO:0045785 BP GO:0045785 positive regulation of cell adhesion
## GO:0007162 BP GO:0007162 negative regulation of cell adhesion
## GO:0030198 BP GO:0030198 extracellular matrix organization
## GO:0043062 BP GO:0043062 extracellular structure organization
## GO:0051250 BP GO:0051250 negative regulation of lymphocyte activation
## GeneRatio BgRatio pvalue p.adjust qvalue
## GO:0010810 5/33 214/23328 1.193628e-05 0.01428434 0.008185655
## GO:0045785 6/33 435/23328 2.935433e-05 0.01428434 0.008185655
## GO:0007162 5/33 290/23328 5.117751e-05 0.01428434 0.008185655
## GO:0030198 5/33 302/23328 6.201521e-05 0.01428434 0.008185655
## GO:0043062 5/33 303/23328 6.299252e-05 0.01428434 0.008185655
## GO:0051250 4/33 158/23328 7.112535e-05 0.01428434 0.008185655
## geneID Count
## GO:0010810 Col8a1/Fbln2/Mmp14/Postn/Sdc4 5
## GO:0045785 Col8a1/Fbln2/Il4ra/Ptpn22/Sdc4/Tnfsf13b 6
## GO:0007162 Il4ra/Mmp14/Postn/Ptpn22/Sdc4 5
## GO:0030198 Bcl3/Col8a1/Fbln2/Mmp14/Postn 5
## GO:0043062 Bcl3/Col8a1/Fbln2/Mmp14/Postn 5
## GO:0051250 Il4ra/Lyn/Ptpn22/Sdc4 4
down.debchip.fe.results[["plot"]]
# convert for DESeq2 results
dds.results.ordered <- IDConversion(deres = dds.results.ordered, gene.type = "ENSEMBL",sort.key = "log2FoldChange")
## 'select()' returned 1:many mapping between keys and columns
head(dds.results.ordered)
## baseMean log2FoldChange lfcSE stat
## ENSMUSG00000000001.4 4178.2932143 0.03256101 0.1097033 0.2968099
## ENSMUSG00000000003.15 0.0000000 NA NA NA
## ENSMUSG00000000028.14 416.5975492 -0.13238461 0.1709810 -0.7742650
## ENSMUSG00000000031.16 200.5762259 0.57949970 0.7478555 0.7748819
## ENSMUSG00000000037.16 202.0237326 0.13726485 0.2659695 0.5160925
## ENSMUSG00000000049.11 0.3930718 -2.07400983 2.6239287 -0.7904216
## pvalue padj ENTREZID SYMBOL
## ENSMUSG00000000001.4 0.7666117 0.9997304 14679 Gnai3
## ENSMUSG00000000003.15 NA NA 54192 Pbsn
## ENSMUSG00000000028.14 0.4387741 0.9997304 12544 Cdc45
## ENSMUSG00000000031.16 0.4384095 0.9997304 14955 H19
## ENSMUSG00000000037.16 0.6057898 0.9997304 107815 Scml2
## ENSMUSG00000000049.11 0.4292816 0.9997304 11818 Apoh
count.matrix <- IDConversion(deres = count.matrix, gene.type = "ENSEMBL",sort.key=NULL)
## 'select()' returned 1:many mapping between keys and columns
head(count.matrix)
## KO1 KO2 KO3 KO4 KO5 KO6 WT1 WT2 WT3 WT4 WT5
## ENSMUSG00000000001.4 4556 4218 3835 3718 5741 3875 4115 5074 2931 4374 5118
## ENSMUSG00000000003.15 0 0 0 0 0 0 0 0 0 0 0
## ENSMUSG00000000028.14 350 579 435 316 432 317 245 621 419 506 545
## ENSMUSG00000000031.16 268 804 66 207 46 66 336 112 60 69 137
## ENSMUSG00000000037.16 262 157 184 162 301 233 311 176 94 139 229
## ENSMUSG00000000049.11 0 0 0 0 0 0 1 0 2 0 1
## WT6 ENTREZID SYMBOL
## ENSMUSG00000000001.4 3625 14679 Gnai3
## ENSMUSG00000000003.15 0 54192 Pbsn
## ENSMUSG00000000028.14 371 12544 Cdc45
## ENSMUSG00000000031.16 193 14955 H19
## ENSMUSG00000000037.16 186 107815 Scml2
## ENSMUSG00000000049.11 0 11818 Apoh
Here, we provide five different normalization methods: CPM, TMM, DESeq2’s median of ratios, RPKM and TPM, of which RPKM and TPM need to provide gtf files to calculate gene length.
Counts Per Million reads mapped (CPM) takes into account the sequencing depth. For each feature i, CPM is the count of sequenced fragments mapping to the feature scaled by the total number of reads times one million (to bring it up to a more convenient number).
\[ \mathrm{CPM}=\frac{\text { Number of reads mapped to gene } \times 10^{6}}{\text { Total number of mapped reads }} \]
cpm.matrix=NormalizedCount(dds, norm.type="CPM")
head(cpm.matrix)
## KO1 KO2 KO3 KO4 KO5
## ENSMUSG00000000001.4 405.92819 400.64944 389.764529 377.62444 380.625026
## ENSMUSG00000000003.15 0.00000 0.00000 0.000000 0.00000 0.000000
## ENSMUSG00000000028.14 31.18412 54.99669 44.210579 32.09503 28.641354
## ENSMUSG00000000031.16 23.87813 76.36846 6.707812 21.02428 3.049774
## ENSMUSG00000000037.16 23.34354 14.91275 18.700567 16.45378 19.956128
## ENSMUSG00000000049.11 0.00000 0.00000 0.000000 0.00000 0.000000
## KO6 WT1 WT2 WT3 WT4
## ENSMUSG00000000001.4 362.974809 424.0865495 343.046563 389.7566993 402.307594
## ENSMUSG00000000003.15 0.000000 0.0000000 0.000000 0.0000000 0.000000
## ENSMUSG00000000028.14 29.693681 25.2493814 41.985005 55.7175220 46.540385
## ENSMUSG00000000031.16 6.182281 34.6277231 7.572175 7.9786428 6.346416
## ENSMUSG00000000037.16 21.825324 32.0512556 11.899132 12.4998737 12.784809
## ENSMUSG00000000049.11 0.000000 0.1030587 0.000000 0.2659548 0.000000
## WT5 WT6
## ENSMUSG00000000001.4 383.64870911 355.77287
## ENSMUSG00000000003.15 0.00000000 0.00000
## ENSMUSG00000000028.14 40.85356516 36.41151
## ENSMUSG00000000031.16 10.26961179 18.94184
## ENSMUSG00000000037.16 17.16599343 18.25483
## ENSMUSG00000000049.11 0.07496067 0.00000
edgeR’s Trimmed Mean of M values (TMM) takes into account the sequencing depth, RNA composition, and gene length. For detailed information, please refer to A scaling normalization method for differential expression analysis of RNA-seq data.
tmm.matrix=NormalizedCount(dds, norm.type="TMM")
head(tmm.matrix)
## KO1 KO2 KO3 KO4 KO5
## ENSMUSG00000000001.4 421.83331 371.93907 371.189992 382.69198 413.745541
## ENSMUSG00000000003.15 0.00000 0.00000 0.000000 0.00000 0.000000
## ENSMUSG00000000028.14 32.40598 51.05565 42.103689 32.52573 31.133613
## ENSMUSG00000000031.16 24.81372 70.89592 6.388146 21.30641 3.315153
## ENSMUSG00000000037.16 24.25819 13.84410 17.809376 16.67458 21.692633
## ENSMUSG00000000049.11 0.00000 0.00000 0.000000 0.00000 0.000000
## KO6 WT1 WT2 WT3 WT4
## ENSMUSG00000000001.4 373.041177 486.1604573 334.563934 361.3055744 394.46383
## ENSMUSG00000000003.15 0.000000 0.0000000 0.000000 0.0000000 0.00000
## ENSMUSG00000000028.14 30.517175 28.9451548 40.946828 51.6503022 45.63299
## ENSMUSG00000000031.16 6.353734 39.6962123 7.384935 7.3962247 6.22268
## ENSMUSG00000000037.16 22.430605 36.7426251 11.604898 11.5874186 12.53554
## ENSMUSG00000000049.11 0.000000 0.1181435 0.000000 0.2465408 0.00000
## WT5 WT6
## ENSMUSG00000000001.4 371.61142274 347.48647
## ENSMUSG00000000003.15 0.00000000 0.00000
## ENSMUSG00000000028.14 39.57175174 35.56344
## ENSMUSG00000000031.16 9.94739447 18.50066
## ENSMUSG00000000037.16 16.62739660 17.82965
## ENSMUSG00000000049.11 0.07260872 0.00000
DESeq2’s median of ratios takes into account the sequencing depth and RNA composition. For detailed information, please refer to Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
deseq2.matrix=NormalizedCount(dds, norm.type="DESeq2")
head(deseq2.matrix)
## KO1 KO2 KO3 KO4 KO5
## ENSMUSG00000000001.4 4606.4055 4014.0575 4037.53736 4145.2746 4476.42605
## ENSMUSG00000000003.15 0.0000 0.0000 0.00000 0.0000 0.00000
## ENSMUSG00000000028.14 353.8722 551.0050 457.97360 352.3149 336.84307
## ENSMUSG00000000031.16 270.9650 765.1262 69.48565 230.7886 35.86755
## ENSMUSG00000000037.16 264.8986 149.4090 193.71757 180.6171 234.69853
## ENSMUSG00000000049.11 0.0000 0.0000 0.00000 0.0000 0.00000
## KO6 WT1 WT2 WT3 WT4
## ENSMUSG00000000001.4 4069.51479 5285.859380 3591.55792 3889.896093 4266.4884
## ENSMUSG00000000003.15 0.00000 0.000000 0.00000 0.000000 0.0000
## ENSMUSG00000000028.14 332.91257 314.710947 439.56592 556.078630 493.5627
## ENSMUSG00000000031.16 69.31303 431.603585 79.27759 79.629398 67.3040
## ENSMUSG00000000037.16 244.69599 399.490223 124.57907 124.752724 135.5834
## ENSMUSG00000000049.11 0.00000 1.284534 0.00000 2.654313 0.0000
## WT5 WT6
## ENSMUSG00000000001.4 3981.8764585 3774.6246
## ENSMUSG00000000003.15 0.0000000 0.0000
## ENSMUSG00000000028.14 424.0177159 386.3133
## ENSMUSG00000000031.16 106.5879396 200.9662
## ENSMUSG00000000037.16 178.1652421 193.6773
## ENSMUSG00000000049.11 0.7780142 0.0000
Reads Per Kilobase of exon per Million reads mapped (RPKM) takes into account the sequencing depth and gene length. For each feature i as the count scaled by the feature’s length times one thousand (to kilobase) and further scaled by the total number of reads times one million.
\[ \text { RPKM }=\frac{\text { Number of reads mapped to gene } \times 10^{3} \times 10^{6}}{\text { Total number of mapped reads } \times \text { gene length }} \]
rpkm.matrix=NormalizedCount(dds,gtf.file = '/path/to/gtf', norm.type="RPKM")
Transcripts Per kilobase Million (TPM) takes into account the sequencing depth and gene length.
\[ \mathrm{TPM}=\frac{\frac{\text { total reads mapped to gene } \times 10^{3}}{\text { gene length }}}{\sum(\frac{\text { total reads mapped to gene } \times 10^{3}}{\text { gene length }})} \times 10^{6} \]
TPM is proportional to RPKM:
\[ \mathrm{TPM}=\frac{R P K M}{\sum(R P K M)} \times 10^{6} \]
tpm.matrix=NormalizedCount(dds,gtf.file = '/path/to/gtf', norm.type="TPM")
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /home/softwares/anaconda3/envs/r4.0/lib/libopenblasp-r0.3.12.so
##
## locale:
## [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0
## [2] GenomicFeatures_1.42.2
## [3] msigdbr_7.5.1
## [4] ggrepel_0.9.1
## [5] ggplot2_3.3.5
## [6] edgeR_3.32.1
## [7] limma_3.46.0
## [8] org.Mm.eg.db_3.12.0
## [9] AnnotationDbi_1.52.0
## [10] DESeq2_1.30.1
## [11] SummarizedExperiment_1.20.0
## [12] Biobase_2.50.0
## [13] MatrixGenerics_1.2.1
## [14] matrixStats_0.58.0
## [15] GenomicRanges_1.42.0
## [16] GenomeInfoDb_1.26.4
## [17] IRanges_2.24.1
## [18] S4Vectors_0.28.1
## [19] BiocGenerics_0.42.0
## [20] DEbChIP_0.1.0
## [21] knitr_1.37
## [22] BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] ggvenn_0.1.9
## [2] utf8_1.2.1
## [3] tidyselect_1.1.0
## [4] RSQLite_2.2.5
## [5] grid_4.0.3
## [6] BiocParallel_1.24.1
## [7] scatterpie_0.1.7
## [8] munsell_0.5.0
## [9] ragg_0.4.0
## [10] codetools_0.2-18
## [11] withr_2.4.1
## [12] misc3d_0.9-1
## [13] colorspace_2.0-0
## [14] GOSemSim_2.16.1
## [15] highr_0.8
## [16] rstudioapi_0.13
## [17] robustbase_0.95-0
## [18] DOSE_3.16.0
## [19] labeling_0.4.2
## [20] GenomeInfoDbData_1.2.4
## [21] polyclip_1.10-0
## [22] bit64_4.0.5
## [23] farver_2.1.0
## [24] downloader_0.4
## [25] vctrs_0.3.7
## [26] generics_0.1.0
## [27] xfun_0.30
## [28] BiocFileCache_1.14.0
## [29] R6_2.5.0
## [30] doParallel_1.0.16
## [31] clue_0.3-59
## [32] graphlayouts_0.7.1
## [33] rsvd_1.0.3
## [34] locfit_1.5-9.4
## [35] gridGraphics_0.5-1
## [36] bitops_1.0-6
## [37] cachem_1.0.4
## [38] fgsea_1.16.0
## [39] DelayedArray_0.16.3
## [40] assertthat_0.2.1
## [41] scales_1.1.1
## [42] ggraph_2.0.5
## [43] enrichplot_1.10.2
## [44] gtable_0.3.0
## [45] beachmat_2.6.4
## [46] Cairo_1.5-12.2
## [47] sva_3.38.0
## [48] tidygraph_1.2.0
## [49] rlang_0.4.10
## [50] genefilter_1.72.1
## [51] systemfonts_1.0.1
## [52] GlobalOptions_0.1.2
## [53] splines_4.0.3
## [54] rtracklayer_1.50.0
## [55] checkmate_2.0.0
## [56] BiocManager_1.30.16
## [57] yaml_2.2.1
## [58] reshape2_1.4.4
## [59] backports_1.2.1
## [60] qvalue_2.22.0
## [61] clusterProfiler_3.18.1
## [62] tcltk_4.0.3
## [63] tools_4.0.3
## [64] bookdown_0.26
## [65] ggplotify_0.1.0
## [66] ellipsis_0.3.1
## [67] gplots_3.1.1
## [68] jquerylib_0.1.3
## [69] RColorBrewer_1.1-2
## [70] Rcpp_1.0.6
## [71] plyr_1.8.6
## [72] sparseMatrixStats_1.2.1
## [73] progress_1.2.2
## [74] zlibbioc_1.36.0
## [75] purrr_0.3.4
## [76] RCurl_1.98-1.3
## [77] prettyunits_1.1.1
## [78] openssl_1.4.3
## [79] GetoptLong_1.0.5
## [80] viridis_0.6.1
## [81] cowplot_1.1.1
## [82] cluster_2.1.1
## [83] magrittr_2.0.1
## [84] data.table_1.14.2
## [85] DO.db_2.9
## [86] circlize_0.4.15
## [87] mvtnorm_1.1-2
## [88] patchwork_1.0.0
## [89] hms_1.0.0
## [90] evaluate_0.14
## [91] xtable_1.8-4
## [92] XML_3.99-0.6
## [93] gridExtra_2.3
## [94] shape_1.4.6
## [95] ggupset_0.3.0
## [96] compiler_4.0.3
## [97] biomaRt_2.46.3
## [98] tibble_3.1.0
## [99] KernSmooth_2.23-18
## [100] crayon_1.4.1
## [101] shadowtext_0.0.9
## [102] htmltools_0.5.2
## [103] mgcv_1.8-34
## [104] pcaPP_2.0-1
## [105] ggfun_0.0.4
## [106] rrcov_1.7-0
## [107] tidyr_1.1.3
## [108] geneplotter_1.68.0
## [109] DBI_1.1.1
## [110] tweenr_1.0.2
## [111] ChIPseeker_1.33.0.900
## [112] dbplyr_2.1.1
## [113] ComplexHeatmap_2.13.1
## [114] MASS_7.3-53.1
## [115] rappdirs_0.3.3
## [116] boot_1.3-28
## [117] babelgene_21.4
## [118] Matrix_1.3-3
## [119] cli_2.4.0
## [120] parallel_4.0.3
## [121] igraph_1.2.6
## [122] pkgconfig_2.0.3
## [123] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [124] rvcheck_0.1.8
## [125] GenomicAlignments_1.26.0
## [126] xml2_1.3.2
## [127] foreach_1.5.1
## [128] PCAtools_2.2.0
## [129] annotate_1.68.0
## [130] bslib_0.3.1
## [131] dqrng_0.2.1
## [132] DEFormats_1.18.0
## [133] XVector_0.30.0
## [134] yulab.utils_0.0.4
## [135] stringr_1.4.0
## [136] digest_0.6.27
## [137] Biostrings_2.58.0
## [138] rmarkdown_2.14
## [139] fastmatch_1.1-3
## [140] DelayedMatrixStats_1.12.3
## [141] curl_4.3
## [142] Rsamtools_2.6.0
## [143] gtools_3.8.2
## [144] NOISeq_2.34.0
## [145] rjson_0.2.20
## [146] nlme_3.1-152
## [147] lifecycle_1.0.0
## [148] jsonlite_1.7.2
## [149] viridisLite_0.4.0
## [150] askpass_1.1
## [151] fansi_0.4.2
## [152] pillar_1.5.1
## [153] lattice_0.20-45
## [154] DEoptimR_1.0-11
## [155] fastmap_1.1.0
## [156] httr_1.4.2
## [157] plotrix_3.8-2
## [158] survival_3.2-10
## [159] GO.db_3.12.1
## [160] glue_1.4.2
## [161] png_0.1-7
## [162] iterators_1.0.13
## [163] plot3D_1.4
## [164] bit_4.0.4
## [165] ggforce_0.3.3
## [166] stringi_1.5.3
## [167] sass_0.4.1
## [168] blob_1.2.1
## [169] textshaping_0.1.2
## [170] BiocSingular_1.6.0
## [171] caTools_1.18.2
## [172] memoise_2.0.0
## [173] dplyr_1.0.5
## [174] irlba_2.3.5